Animals run robustly in diverse terrain. This locomotion robustness is puzzling because axon conduction velocity is limited to a few tens of meters per second. If reflex loops deliver sensory information with significant delays, one would expect a destabilizing effect on sensorimotor control. Hence, an alternative explanation describes a hierarchical structure of low-level adaptive mechanics and high-level sensorimotor control to help mitigate the effects of transmission delays. Motivated by the concept of an adaptive mechanism triggering an immediate response, we developed a tunable physical damper system. Our mechanism combines a tendon with adjustable slackness connected to a physical damper. The slack damper allows adjustment of damping force, onset timing, effective stroke, and energy dissipation. We characterize the slack damper mechanism mounted to a legged robot controlled in open-loop mode. The robot hops vertically and in the plane over varying terrains and perturbations. During forward hopping, slack-based damping yields faster perturbation recovery (up to 170%) at a higher energetic cost (27%). The tunable slack mechanism auto-engages the damper during perturbations, leading to perturbation-triggered damping that improves robustness at minimum energetic cost. With the results from the slack damper mechanism, we propose a new functional interpretation of animals' redundant muscle tendons as tunable dampers.
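The slack idea can be illustrated with a toy model: the damper produces no force until leg compression has taken up the tendon slack, and only then resists motion. The sketch below assumes a linear viscous damper and uses hypothetical names (`slack_m`, `damping_ns_per_m`); it is an illustration of the concept, not the authors' model or their measured characterization.

```python
def slack_damper_force(compression_m, velocity_m_s, slack_m, damping_ns_per_m):
    """Toy force of a damper pulled through a slack tendon.

    Assumptions (not from the paper): linear viscous damping, and a tendon
    that can only pull, so force appears only while compression exceeds the
    slack and is still increasing.
    """
    tendon_taut = compression_m > slack_m
    compressing = velocity_m_s > 0.0
    if tendon_taut and compressing:
        return damping_ns_per_m * velocity_m_s
    return 0.0


def dissipated_energy(compressions_m, velocities_m_s, dt_s, slack_m, damping_ns_per_m):
    """Sum force * velocity * dt over a sampled trajectory (rectangle rule)."""
    energy_j = 0.0
    for x, v in zip(compressions_m, velocities_m_s):
        force_n = slack_damper_force(x, v, slack_m, damping_ns_per_m)
        energy_j += force_n * v * dt_s
    return energy_j
```

In this toy model, a larger `slack_m` delays damper onset and shortens the effective stroke, so small compressions dissipate nothing while larger, perturbation-induced compressions engage the damper.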
When reading a story, humans can rapidly understand new fictional characters from a few observations, mainly by drawing analogies to fictional and real people they have met before. This reflects the few-shot and meta-learning essence of humans' inference of characters' mental states, i.e., humans' theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP benchmark, TOM-IN-AMC, the first assessment of models' ability to meta-learn ToM in a realistic narrative understanding scenario. Our benchmark consists of $\sim$1,000 parsed movie scripts, each corresponding to a few-shot character understanding task, and requires models to mimic humans' ability to quickly digest characters from a few starting scenes in a new movie. Our human study verified that humans can solve our problem by inferring characters' mental states based on their previously seen movies, while the state-of-the-art metric-learning and meta-learning approaches adapted to our task lag 30% behind.
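Most legged robots are built with leg structures made of serially mounted links and actuators, and are controlled through complex controllers and sensor feedback. In comparison, animals developed multi-segment legs, mechanical coupling between joints, and multi-segmented feet. They run agilely over all terrains, arguably with simpler locomotion control. Here we focus on developing foot mechanisms that resist slipping and sinking, also in natural terrain. We present first results of multi-segment feet mounted to a bird-inspired robot leg with multi-joint mechanical tendon coupling. Our one- and two-segment, mechanically adaptive feet show viable horizontal forces on multiple soft and hard substrates before starting to slip. We also observe that segmented feet reduce sinking on soft substrates compared to ball-feet and cylinder-feet. We report how multi-segmented feet provide a range of viable centre-of-pressure points well suited for bipedal robots, but also for quadruped robots on slopes and natural terrain. Our results also offer a functional understanding of segmented feet in animals such as ratite birds.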
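Legged locomotion in humans is governed by the natural dynamics of the human body and by neural control. One mechanism that is assumed to contribute to the high efficiency of human walking is the impulsive ankle push-off, which potentially powers the swing leg catapult. However, the mechanics of the human lower leg, with its complex muscle-tendon units spanning single and multiple joints, is not yet understood. Legged robots allow testing the interaction between complex leg mechanics, control, and environment in real-world walking gait. We developed a 0.49m tall, 2.2kg anthropomorphic bipedal robot with Soleus and Gastrocnemius muscle-tendon units represented by linear springs, acting as mono- and biarticular elastic structures around the robot's ankle and knee joints. We tested the influence of three Soleus and Gastrocnemius spring-tendon configurations on the ankle power curves, the coordination of the ankle and knee joint movements, the total cost of transport, and walking speed. We controlled the robot with a feed-forward central pattern generator, leading to walking speeds between 0.35m/s and 0.57m/s at 1.0Hz locomotion frequency, at 0.35m leg length. We found differences between all three configurations. The Soleus spring-tendon modulates the robot's speed and energy efficiency, likely through ankle power amplification, while the Gastrocnemius spring-tendon changes the movement coordination between ankle and knee joints during push-off.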
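Cost of transport, one of the metrics named above, is commonly defined as power divided by weight times speed, which makes it dimensionless and comparable across robots and animals. The sketch below uses that standard definition together with the Froude number as a dimensionless speed. The robot mass, leg length, and speeds come from the abstract; the power value is a placeholder assumption, not a measurement from the paper.

```python
G_M_S2 = 9.81  # gravitational acceleration, rounded


def cost_of_transport(power_w, mass_kg, speed_m_s):
    """Dimensionless cost of transport: P / (m * g * v)."""
    if mass_kg <= 0 or speed_m_s <= 0:
        raise ValueError("mass and speed must be positive")
    return power_w / (mass_kg * G_M_S2 * speed_m_s)


def froude_number(speed_m_s, leg_length_m):
    """Dimensionless speed: v^2 / (g * L)."""
    if leg_length_m <= 0:
        raise ValueError("leg length must be positive")
    return speed_m_s ** 2 / (G_M_S2 * leg_length_m)


if __name__ == "__main__":
    hypothetical_power_w = 10.0  # placeholder only, NOT a value from the paper
    for speed in (0.35, 0.57):  # speed range reported in the abstract
        cot = cost_of_transport(hypothetical_power_w, 2.2, speed)
        fr = froude_number(speed, 0.35)
        print(f"v={speed} m/s  CoT={cot:.2f}  Fr={fr:.3f}")
```

At fixed power, a faster gait lowers the cost of transport, which is why speed and energy efficiency are reported together when comparing spring-tendon configurations.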
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, if at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, across the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
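To make the idea of distilling token relations concrete, the sketch below computes a softmax-normalized, scaled dot-product relation matrix between tokens for a teacher and a student, and penalizes their KL divergence. This is a minimal illustration under assumptions: the function names are hypothetical, the choice of query-key relations with a KL loss is one plausible formulation, and it should not be read as the TinyMIM implementation (see the linked repository for that).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(queries, keys):
    """Row-wise softmax of scaled dot products between token features."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    relations = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        relations.append(softmax(scores))
    return relations


def relation_distillation_loss(teacher_rel, student_rel, eps=1e-12):
    """Mean KL(teacher || student) over token rows."""
    total = 0.0
    for t_row, s_row in zip(teacher_rel, student_rel):
        total += sum(
            t * (math.log(t + eps) - math.log(s + eps))
            for t, s in zip(t_row, s_row)
        )
    return total / len(teacher_rel)


if __name__ == "__main__":
    # Two tokens; teacher features are 3-dimensional, student features 2-dimensional.
    teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    student = [[1.0, 0.0], [1.0, 0.0]]
    loss = relation_distillation_loss(
        token_relations(teacher, teacher), token_relations(student, student)
    )
    print(loss)
```

In this sketch the relation matrices are token-by-token, so their shape does not depend on feature width, and the teacher and student can be compared directly even when their embedding dimensions differ.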
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open-source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided with a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively, through a large-scale user study. Finally, we offer lessons learned and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then run a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately prevents them from benefiting each other and from achieving the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering results and extract topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On the one hand, it provides text embeddings with a strong cluster structure, which facilitates effective text clustering; on the other hand, it attends strongly to topic-related words for topic extraction because of its self-attention architecture. Moreover, the training of the enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.
Cognitive Computing (COC) aims to build highly cognitive machines with low computational resources that respond in real time. However, scholarly literature shows varying research areas and various interpretations of COC. This calls for a cohesive architecture that delineates the nature of COC. We argue that if Herbert Simon considered design science to be the science of the artificial, then cognitive systems are the products of cognitive science, or 'the newest science of the artificial'. Therefore, building a conceptual basis for COC is an essential step toward prospective cognitive computing-based systems. This paper proposes an architecture of COC by analyzing the literature on COC using a myriad of statistical analysis methods. Then, we compare the statistical analysis results with previous qualitative analysis results to confirm our findings. The study also comprehensively surveys the recent research on COC to identify the state of the art and connect the advances in varied research disciplines in COC. The study found that there are three underlying computing paradigms, Von Neumann, Neuromorphic Engineering, and Quantum Computing, that comprehensively complement the structure of cognitive computation. The research discusses possible applications and open research directions under the COC umbrella.
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
Robotic teleoperation is a key technology for a wide variety of applications. It allows sending robots instead of humans to remote, possibly dangerous locations while still using the human brain with its enormous knowledge and creativity, especially for solving unexpected problems. A main challenge in teleoperation consists of providing enough feedback to the human operator for situation awareness, thus creating full immersion, as well as offering the operator suitable control interfaces to achieve efficient and robust task fulfillment. We present a bimanual telemanipulation system consisting of an anthropomorphic avatar robot and an operator station providing force and haptic feedback to the human operator. The avatar arms are controlled in Cartesian space with a direct mapping of the operator movements. The measured forces and torques on the avatar side are haptically displayed to the operator. We developed a predictive avatar model for limit avoidance which runs on the operator side, ensuring low latency. The system was successfully evaluated during the ANA Avatar XPRIZE competition semifinals. In addition, we performed in-lab experiments and carried out a small user study with mostly untrained operators.